Abstract: This paper studies the properties of the quasi-maximum likelihood
estimator  (QMLE)  and  related  test  statistics  in  dynamic  models  that  jointly 
parameterize  conditional  means  and  conditional  covariances  when  a  normal  log 
likelihood  is  maximized  but  the  assumption  of  normality  is  violated.   Because 
the  score  of  the  normal  log  likelihood  has  the  martingale  difference  property 
under  fairly  general  regularity  conditions  provided  only  that  the  first  two 
conditional  moments  are  correctly  specified,  the  QMLE  is  generally  consistent 
and  has  a  limiting  normal  distribution.   Easily  computable  formulas  for 
asymptotic  standard  errors  that  are  valid  under  nonnormality  are  also 
available.   Further,  we  show  how  robust  LM  tests  for  the  adequacy  of  the 
jointly  parameterized  mean  and  variance  can  be  computed  from  simple  auxiliary 
regressions.   An  appealing  feature  of  these  robust  inference  procedures  is 
that  only  first  derivatives  of  the  conditional  mean  and  variance  functions 
are  called  for.   In  addition,  the  robust  tests  lose  nothing  in  terms  of 
asymptotic  local  power  if  the  normality  assumption  is  true.   A  Monte  Carlo 
study  indicates  that  the  asymptotic  results  carry  over  to  finite  samples. 
Estimation of several AR and AR-GARCH time series models reveals that in most
situations the robust forms of the test statistics compare favorably to the
two standard (nonrobust) formulations of the Wald and LM tests. Also, for
the GARCH models and the sample sizes analyzed here, the bias in the exact
MLE or the QMLE appears to be relatively small, and typically there is only
minor loss in efficiency for the parameters in the conditional mean from not
modelling the heteroskedasticity.


1. Introduction

Dynamic  econometric  models  that  jointly  parameterize  conditional  means, 
conditional  variances,  and  conditional  covariances  are  becoming  increasingly 
popular  in  the  analysis  of  economic  time  series.   Engle's  (1982a)  pioneering 
autoregressive  conditional  heteroskedasticity  (ARCH)  model  has  been  expanded 
and  adapted  for  application  in  several  diverse  fields.   One  useful  extension 
is the generalized ARCH (GARCH) model introduced by Bollerslev (1986), which
allows for richer dynamics in the conditional second moments. The
ARCH-in-mean (ARCH-M) model introduced by Engle, Lilien, and Robins (1987)
has  been  successfully  applied  in  financial  economics  to  both  univariate  and 
multivariate  dynamic  asset  pricing  models,  where  conditional  mean  equations 
that  contain  conditional  second  moments  arise  naturally  from  considerations 
of  attitudes  toward  risk.   For  empirical  applications  of  the  ARCH-M  model  see 
Engle,  Lilien,  and  Robins  (1987),  Domowitz  and  Hakkio  (1985),  and  Bollerslev, 
Engle,  and  Wooldridge  (1988). 

As demonstrated by Pagan and Ullah (1988), certain hypotheses involving
risk measures can be tested by means of instrumental variables (IV)
estimation without explicitly parameterizing the relevant conditional
variances and covariances. Unfortunately, the IV approach is not very
helpful when interest lies in obtaining estimates of the risk premia because
the premia depend directly on the conditional second moments. A further
limitation of the IV approach is that under data generating mechanisms such
as the ARCH-M model evidently no instrumental variables estimators exist that
do not require a priori knowledge of the ARCH-M structure (Pagan and Ullah
(1988, p. 99)). For these reasons, studies that jointly estimate dynamic
conditional means and conditional second moments have relied heavily on
maximum  likelihood  procedures,  frequently  under  the  assumption  of  conditional 
normality. 

Taken  literally,  the  assumption  of  conditional  normality  is  quite 
restrictive  for  many  economic  purposes.   The  symmetry  imposed  under  normality 
is difficult to justify in general, but perhaps even more importantly,
researchers who have attempted to forecast economic variables might concur
that the tails of even conditional distributions often seem to be fatter than
those of the normal distribution. The extensive use of maximum likelihood
under  the  assumption  of  normality  is  almost  certainly  due  to  its  relative 
simplicity  and  the  widespread  familiarity  with  the  properties  of  the  maximum 
likelihood  estimator  (MLE)  under  ideal  conditions. 

Because  maximum  likelihood  under  normality  is  so  widely  used,  it  is 
important  to  investigate  its  properties  in  a  setting  general  enough  to 
include  most  cases  of  interest  to  applied  researchers.   The  purpose  of  this 
paper is to study the behavior of the quasi-maximum likelihood estimator
(QMLE)  and  related  test  statistics  in  a  general  class  of  dynamic  models  when 
a  normal  log  likelihood  is  maximized  but  the  normality  assumption  is 
violated.   An  important  conclusion,  developed  in  section  2,  is  that  the  QMLE 
is  still  consistent  for  the  parameters  of  the  jointly  parameterized 
conditional  mean  and  conditional  variance.   In  section  2  we  also  derive 
easily computable formulas for the asymptotic standard errors that are valid
under nonnormality. These formulas facilitate the construction of
computationally simple Wald statistics that are valid under nonnormality, yet
still optimal under normality. Section 3 derives robust, regression-based
Lagrange Multiplier (LM) diagnostics that can be used to check the adequacy
of the specification of the first two conditional moments. Taken together,
sections  2  and  3  contain  the  results  and  formulas  needed  to  conduct  inference 
about  and  specification  analysis  of  dynamic  conditional  means  and  second 
moments while being robust to nonnormality. An appealing feature of these
results is that only first derivatives of the mean and variance functions are
needed to compute all of the statistics considered here, including the robust
covariance matrix of the QMLE, the robust Wald statistic, and the robust LM
statistic.

Section  4  presents  some  Monte  Carlo  evidence  on  the  performance  of  both 
the  robust  tests  and  the  more  popular  nonrobust  tests.   Broadly  speaking,  the 
results  for  the  robust  statistics  are  quite  encouraging,  and  suggest  that 
they  can  be  profitably  applied  in  empirical  studies. 

2. Consistency and Asymptotic Normality of the QMLE

Let $\{(y_t, z_t): t = 1, 2, \ldots\}$ be a sequence of observable random vectors with
$y_t$ $1 \times K$ and $z_t$ $1 \times L$. The vector $y_t$ contains the "endogenous" variables and $z_t$
contains contemporaneous "exogenous" (conditioning) variables. Let
$x_t \equiv (z_t, y_{t-1}, z_{t-1}, \ldots, y_1, z_1)$ denote the $1 \times [L + (t-1)(L+K)]$ vector of predetermined
variables. The purpose of the analysis is to estimate and test hypotheses
about the conditional expectation and conditional variance of $y_t$ given the
predetermined variables $x_t$. If one wants to condition only on information
observed before $t$, $z_t$ can be excluded from $x_t$ without altering any of the
subsequent analysis. Incidentally, cross section analysis is accommodated by
setting $x_t = z_t$ and typically assuming that the observations are
independently distributed. In what follows, set $Y_T \equiv (y_T, y_{T-1}, \ldots, y_1)$ and
$Z_T \equiv (z_T, \ldots, z_1)$.


The conditional mean and variance functions are jointly parameterized by
a finite dimensional vector $\theta$:

$\{\mu_t(x_t, \theta): \theta \in \Theta\}$

$\{\Omega_t(x_t, \theta): \theta \in \Theta\}$,

where $\Theta$ is a subset of $R^P$ and $\mu_t$ and $\Omega_t$ are known functions of $x_t$ and $\theta$. In
the subsequent analysis, the validity of most of the inference procedures is
explicitly proven under the null hypothesis that the first two conditional
moments are correctly specified. More formally, there is some $\theta_o \in \Theta$ such
that

(2.1.a)  $E(y_t|x_t) = \mu_t(x_t, \theta_o)$

(2.1.b)  $V(y_t|x_t) = \Omega_t(x_t, \theta_o)$,  $t = 1, 2, \ldots$

Sometimes one is interested in testing (2.1.a) while being robust to
departures from (2.1.b). In section 3 we briefly discuss how to compute
conditional mean statistics that are robust to violation of (2.1.b). This
makes sense only when it is possible to separate the conditional mean and
variance functions in an appropriate sense. Also, in some of the simulations
in section 4 we impose only (2.1.a) under the null hypothesis and investigate
the robustness of various statistics to departures from (2.1.b) as well as to
departures from normality.

The procedure most often used to estimate $\theta_o$ is maximization of a
likelihood function that is constructed under the assumption that the
conditional distribution of $y_t$ given $x_t$ is normal with mean and variance
given by (2.1). This is the approach taken here as well; however, as
mentioned in the introduction, the subsequent analysis does not assume that
$y_t$ has a conditional normal distribution. Nevertheless, as stated formally
in Theorem 2.1 below, the resulting quasi-maximum likelihood estimator (QMLE)
is generally consistent for $\theta_o$ and under standard regularity conditions it is
asymptotically normally distributed.

Rather than employing quasi-maximum likelihood to estimate $\theta_o$, it is
straightforward to use (2.1) to construct a variety of generalized method of
moments (GMM) estimators for $\theta_o$. The results of White (1982b) and Cragg
(1983) even suggest that judicious choice of the instrumental variables could
yield estimators asymptotically more efficient than the QMLE under
nonnormality. To our knowledge, for the general class of models considered
here, the analytical results underlying the selection of optimal instruments
are not yet in place. Any optimal instruments will certainly depend on $\theta_o$,
so that construction of a GMM estimator more efficient than the QMLE requires an
initial consistent estimator of $\theta_o$. In addition, the researcher must make
judgments on the number and type of overidentifying orthogonality
conditions, and the finite sample performance of GMM estimators depends
crucially on these choices. For these reasons, and because of the
indisputable popularity of maximum likelihood procedures, we focus on the
properties of the QMLE.

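For observation $t$, the quasi-conditional log likelihood apart from a
constant is

(2.2)  $l_t(\theta; y_t, x_t) = -\frac{1}{2}\log|\Omega_t(x_t, \theta)| - \frac{1}{2}(y_t - \mu_t(x_t, \theta))\,\Omega_t^{-1}(x_t, \theta)\,(y_t - \mu_t(x_t, \theta))'$.

Letting $\varepsilon_t(y_t, x_t, \theta) \equiv y_t - \mu_t(x_t, \theta)$ denote the $1 \times K$ residual function and
suppressing the dependence of $\varepsilon_t$ and $\Omega_t$ on $x_t$ and $y_t$ yields the more concise
expression

(2.3)  $l_t(\theta) = -\frac{1}{2}\log|\Omega_t(\theta)| - \frac{1}{2}\varepsilon_t(\theta)\,\Omega_t^{-1}(\theta)\,\varepsilon_t(\theta)'$.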
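
As an illustration of (2.3), the following minimal Python sketch evaluates the observation-$t$ quasi-log likelihood for given values of $y_t$, $\mu_t$, and $\Omega_t$. The function name `quasi_loglik_t` and its argument layout are illustrative assumptions, not part of the paper.

```python
import numpy as np

def quasi_loglik_t(y_t, mu_t, omega_t):
    """Normal quasi-log likelihood for one observation, constant omitted (eq. 2.3).

    y_t, mu_t : length-K arrays (or scalars when K = 1)
    omega_t   : K x K conditional variance matrix (or scalar when K = 1)
    """
    eps = np.atleast_1d(np.asarray(y_t, dtype=float) - np.asarray(mu_t, dtype=float))
    omega = np.atleast_2d(np.asarray(omega_t, dtype=float))
    sign, logdet = np.linalg.slogdet(omega)
    if sign <= 0:
        raise ValueError("omega_t must be positive definite")
    # Quadratic form eps * Omega^{-1} * eps', computed without an explicit inverse.
    quad = eps @ np.linalg.solve(omega, eps)
    return -0.5 * logdet - 0.5 * quad
```

Summing this quantity over $t$ gives the objective maximized by the QMLE; how $\mu_t$ and $\Omega_t$ are produced from $\theta$ depends on the model at hand.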

The QMLE $\hat\theta_T$ is obtained by maximizing the quasi-log likelihood function

(2.4)  $L_T(\theta) = \sum_{t=1}^{T} l_t(\theta)$.

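If $\mu_t(x_t; \cdot)$ and $\Omega_t(x_t; \cdot)$ are differentiable on $\Theta$ for all relevant $x_t$, and
$\Omega_t(x_t, \theta)$ is nonsingular with probability one for all $\theta \in \Theta$, then
differentiation of (2.3) yields the $1 \times P$ score function $s_t(\theta)$:

(2.5)  $s_t(\theta)' = \nabla_\theta\mu_t(\theta)'\,\Omega_t^{-1}(\theta)\,\varepsilon_t(\theta)' + \frac{1}{2}\nabla_\theta\Omega_t(\theta)'\,[\Omega_t^{-1}(\theta) \otimes \Omega_t^{-1}(\theta)]\,\mathrm{vec}[\varepsilon_t(\theta)'\varepsilon_t(\theta) - \Omega_t(\theta)]$,

where $\nabla_\theta\mu_t(\theta)$ is the $K \times P$ gradient of $\mu_t$ and $\nabla_\theta\Omega_t(\theta)$ is the $K^2 \times P$ gradient of
$\Omega_t(\theta)$; see appendix A for the definition of the derivative of a matrix.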
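
For a scalar series ($K = 1$) the Kronecker product in (2.5) collapses to $1/\Omega_t^2$, and the score can be sketched in a few lines. The function name `score_t` and the convention of passing gradients as length-$P$ arrays are assumptions made for illustration only.

```python
import numpy as np

def score_t(eps_t, omega_t, grad_mu_t, grad_omega_t):
    """Score (2.5) specialized to K = 1; returns a length-P array.

    eps_t, omega_t           : scalar residual and conditional variance
    grad_mu_t, grad_omega_t  : length-P gradients of the mean and variance functions
    """
    grad_mu_t = np.asarray(grad_mu_t, dtype=float)
    grad_omega_t = np.asarray(grad_omega_t, dtype=float)
    mean_part = grad_mu_t * eps_t / omega_t
    var_part = 0.5 * grad_omega_t * (eps_t**2 - omega_t) / omega_t**2
    return mean_part + var_part
```

Only first derivatives of the mean and variance functions enter, which is the feature the text emphasizes.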

If (2.1.a) holds then the true residual vector is defined as $\varepsilon_t^o \equiv \varepsilon_t(\theta_o)
= y_t - \mu_t(x_t, \theta_o)$ and $E(\varepsilon_t^o|x_t) = 0$. If in addition (2.1.b) holds then
$E(\varepsilon_t^{o\prime}\varepsilon_t^o|x_t) = V(\varepsilon_t^o|x_t) = \Omega_t(x_t, \theta_o)$. It follows immediately from (2.5) that
under correct specification of the first two conditional moments of $y_t$ given
$x_t$,

(2.6)  $E[s_t(\theta_o)|x_t] = 0$.

An immediate implication of (2.6) is that the score evaluated at the true
parameter is a vector martingale difference sequence with respect to the
$\sigma$-fields $\{\sigma(y_t, x_t): t = 1, 2, \ldots\}$. This property of the score of the
conditional log likelihood is well known when the conditional density is
correctly specified; see, for example, Crowder (1976), Basawa, Feigin and
Heyde (1976), and Heijmans and Magnus (1986). The above analysis
demonstrates that the score of a normal log likelihood has the martingale
difference property provided only that the first two moments are correctly
specified. This result extends that of Weiss (1986), who considers a
univariate ARMA model with ARCH errors. In cross section settings, MaCurdy
(1981) and Gourieroux, Monfort and Trognon (1984) have shown that the score
evaluated at the true parameter has zero expectation without the normality
assumption; (2.6) is the extension to general dynamic models.
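
The step from (2.5) to (2.6) can be written out explicitly. Because the gradients and $\Omega_t(\theta_o)$ are functions of $x_t$ alone, taking conditional expectations of (2.5) at $\theta_o$ under (2.1.a) and (2.1.b) gives the following (a worked restatement of the argument, using the notation of the text):

```latex
\begin{align*}
E[s_t(\theta_o)' \mid x_t]
  &= \nabla_\theta \mu_t(\theta_o)'\, \Omega_t^{-1}(\theta_o)\,
     E[\varepsilon_t^{o\prime} \mid x_t] \\
  &\quad + \tfrac{1}{2}\, \nabla_\theta \Omega_t(\theta_o)'\,
     [\Omega_t^{-1}(\theta_o) \otimes \Omega_t^{-1}(\theta_o)]\,
     \mathrm{vec}\{ E[\varepsilon_t^{o\prime} \varepsilon_t^{o} \mid x_t]
       - \Omega_t(\theta_o) \} \\
  &= 0 + 0 = 0 .
\end{align*}
```

No distributional assumption beyond the first two conditional moments enters this calculation.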

In  related  work,  Pagan  and  Sabau  (1987)  examine  the  robustness  of  the 
QMLE in a univariate linear model with conditional heteroskedasticity.
However, their focus is on the consistency of the conditional mean parameters
when the conditional variance is misspecified. They are not concerned with
consistency of the mean and variance parameters under nonnormality, nor do
they  present  limiting  distribution  results.   As  discussed  below,  their 
results  can  be  profitably  applied  in  conjunction  with  our  own. 

Because equation (2.6) holds for any value of the true parameter, the
QMLE is Fisher-consistent, i.e.

$E_\theta[S_T(\theta; Y_T, Z_T)] = 0$  for all $\theta \in \Theta$,

where

$S_T(\theta; Y_T, Z_T) \equiv \sum_{t=1}^{T} s_t(\theta; y_t, x_t)$

and $E_\theta(\cdot)$ denotes the expectation operator when $\theta$ indexes the conditional
mean and variance of $y_t$. Under appropriate regularity conditions,
Fisher-consistency implies weak consistency of the QMLE. The approach of
Wooldridge (1986) can be used to establish asymptotic normality when an
estimator is Fisher-consistent. We do not pursue this here because the
regularity conditions are more involved than the scope of the current paper
warrants. The proof of weak consistency in the appendix does not directly
exploit (2.6), but instead adopts the uniform law of large numbers approach of
Domowitz and White (1982). Identifiability of $\theta_o$ is established by showing
that $\theta_o$ maximizes $E[L_T(\theta)]$. The proof of asymptotic normality of the QMLE in
the appendix does rely directly on (2.6) since a martingale central limit
theorem is applied to $\{s_t(\theta_o)\}$.

For robust inference we also need an expression for the Hessian $h_t(\theta)$ of
$l_t(\theta)$. Actually, for computations, it is useful to observe that when (2.1)
holds all that is needed is $E[h_t(\theta_o)|x_t]$. This matrix has a very convenient
form that involves only first derivatives of the conditional mean and
variance functions. Letting $a_t(\theta_o) \equiv -E[\nabla_\theta s_t(\theta_o)|x_t] = E[-h_t(\theta_o)|x_t]$, it is
straightforward to show that under (2.1.a) and (2.1.b),

(2.7)  $a_t(\theta_o) = \nabla_\theta\mu_t(\theta_o)'\,\Omega_t^{-1}(\theta_o)\,\nabla_\theta\mu_t(\theta_o) + \frac{1}{2}\nabla_\theta\Omega_t(\theta_o)'\,[\Omega_t^{-1}(\theta_o) \otimes \Omega_t^{-1}(\theta_o)]\,\nabla_\theta\Omega_t(\theta_o)$

(see Kroner (1987, Lemma 1) for derivation of a similar result under
normality without the conditional mean parameters). When the normality
assumption holds the $P \times P$ matrix $a_t(\theta_o)$ is the conditional information
matrix. However, if $y_t$ does not have a conditional normal distribution then
$V[s_t(\theta_o)|x_t]$ is generally not equal to $a_t(\theta_o)$ and the information matrix
equality is violated. Nevertheless, it is fairly easy to carry out inference
about $\theta_o$. The proof of the following theorem is provided in appendix A.


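THEOREM 2.1: Suppose that the following conditions hold:

(i) Regularity conditions A.1 in the appendix;

(ii) For some $\theta_o \in$ int $\Theta$ and $t = 1, 2, \ldots$,

$E(y_t|x_t) = \mu_t(x_t, \theta_o)$

and

$V(y_t|x_t) = \Omega_t(x_t, \theta_o)$.

Then

$T^{1/2}(\hat\theta_T - \theta_o) \overset{a}{\sim} N(0, (A_T^o)^{-1} B_T^o (A_T^o)^{-1})$,

where

$A_T^o \equiv E[-H_T(\theta_o)/T] = -T^{-1}\sum_{t=1}^{T} E(h_t(\theta_o)) = T^{-1}\sum_{t=1}^{T} E(a_t(\theta_o))$

$B_T^o \equiv V[T^{-1/2} S_T(\theta_o)] = T^{-1}\sum_{t=1}^{T} E(s_t(\theta_o)' s_t(\theta_o))$.

In addition,

$\hat A_T - A_T^o \overset{p}{\to} 0$,  $\hat B_T - B_T^o \overset{p}{\to} 0$,

where

$\hat A_T \equiv T^{-1}\sum_{t=1}^{T} a_t(\hat\theta_T)$

$\hat B_T \equiv T^{-1}\sum_{t=1}^{T} s_t(\hat\theta_T)' s_t(\hat\theta_T)$.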

The estimators $\hat A_T$ and $\hat B_T$ have the convenient property of being at least
positive semi-definite and usually positive definite. Moreover, they are
computable entirely from the residuals $\hat\varepsilon_t$, the mean and variance functions $\hat\mu_t$
and $\hat\Omega_t$, and the first derivatives of the mean and variance functions $\nabla_\theta\hat\mu_t$ and
$\nabla_\theta\hat\Omega_t$. Thus, they do not require second derivatives of either the mean or
variance functions. The matrix $\hat A_T^{-1}\hat B_T\hat A_T^{-1}$ is a consistent estimator of the
White (1982a) robust asymptotic variance matrix of $T^{1/2}(\hat\theta_T - \theta_o)$ that has
been obtained using only first derivatives. Estimation of these models
typically utilizes numerical approximations to the analytical derivatives.
Theorem 2.1 demonstrates that robust inference can be carried out without
resorting to numerical second derivatives, which are likely to be numerically
unstable and are not guaranteed to yield a negative definite Hessian.
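
To make the first-derivative-only computation concrete, here is a minimal sketch of $\hat A_T$, $\hat B_T$, and the robust covariance estimate for the scalar case $K = 1$, using (2.5) and (2.7). The function name `qmle_robust_cov` and the $T \times P$ array layout for the gradients are assumptions for illustration; the fitted residuals, variances, and gradients are taken as given.

```python
import numpy as np

def qmle_robust_cov(eps, omega, grad_mu, grad_omega):
    """Robust covariance of the QMLE for K = 1, from first derivatives only.

    eps, omega           : length-T fitted residuals and conditional variances
    grad_mu, grad_omega  : T x P gradients of the mean and variance functions
    Returns (A_hat, B_hat, cov), where cov estimates Var(theta_hat).
    """
    eps = np.asarray(eps, dtype=float)
    omega = np.asarray(omega, dtype=float)
    grad_mu = np.asarray(grad_mu, dtype=float)
    grad_omega = np.asarray(grad_omega, dtype=float)
    T = eps.shape[0]

    # Scores s_t from (2.5), one row per observation.
    scores = (grad_mu * (eps / omega)[:, None]
              + 0.5 * grad_omega * ((eps**2 - omega) / omega**2)[:, None])

    # A_hat averages a_t from (2.7); B_hat averages the outer products s_t' s_t.
    A_hat = ((grad_mu / omega[:, None]).T @ grad_mu
             + 0.5 * (grad_omega / (omega**2)[:, None]).T @ grad_omega) / T
    B_hat = scores.T @ scores / T

    A_inv = np.linalg.inv(A_hat)
    cov = A_inv @ B_hat @ A_inv / T   # sandwich form, scaled to theta_hat
    return A_hat, B_hat, cov
```

Square roots of the diagonal of `cov` give standard errors that remain valid under nonnormality when (2.1) holds.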

With Theorem 2.1 in place it is straightforward to construct Wald
statistics for testing hypotheses about $\theta_o$. Assume that the null hypothesis
can be stated as

$H_0$: $c(\theta_o) = 0$

where $c: \Theta \to R^Q$ is continuously differentiable on int $\Theta$ and $Q < P$. Let
$C(\theta) \equiv \nabla_\theta c(\theta)$ be the $Q \times P$ gradient of $c$ on int $\Theta$. If $\theta_o \in$ int $\Theta$ and rank $C(\theta_o) = Q$
then, under the conditions of Theorem 2.1, the Wald statistic

(2.8)  $W_T = T\,c(\hat\theta_T)'\,[\hat C_T\hat A_T^{-1}\hat B_T\hat A_T^{-1}\hat C_T']^{-1}\,c(\hat\theta_T)$

has an asymptotic $\chi_Q^2$ distribution under $H_0$, where $\hat C_T \equiv C(\hat\theta_T)$. Again, we
emphasize that the robust Wald statistic is computable entirely from first
derivatives and has an asymptotic chi-square distribution whether or not the
conditional normality assumption holds. Moreover, the matrix $\hat C_T\hat A_T^{-1}\hat B_T\hat A_T^{-1}\hat C_T'$
is at least positive semi-definite and usually positive definite. Wald tests
constructed from either the inverse of the Hessian or the outer product
of the gradient will not generally lead to valid inference.
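
A minimal sketch of (2.8), assuming $\hat A_T$ and $\hat B_T$ have already been computed as in Theorem 2.1. The function name `robust_wald` and its arguments are hypothetical.

```python
import numpy as np

def robust_wald(c_val, C, A_hat, B_hat, T):
    """Robust Wald statistic (2.8).

    c_val : length-Q value of the restriction function at theta_hat
    C     : Q x P gradient of the restriction at theta_hat
    A_hat, B_hat : P x P matrices from Theorem 2.1
    T     : sample size
    """
    c_val = np.atleast_1d(np.asarray(c_val, dtype=float))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    A_inv = np.linalg.inv(np.asarray(A_hat, dtype=float))
    V = A_inv @ np.asarray(B_hat, dtype=float) @ A_inv   # robust asymptotic variance
    middle = C @ V @ C.T                                 # Q x Q
    return float(T * c_val @ np.linalg.solve(middle, c_val))
```

The returned value is compared with critical values of the chi-square distribution with $Q$ degrees of freedom.
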

3. Robust Lagrange Multiplier Tests

Because estimation of the models considered in this paper can be
computationally difficult, it is useful to have diagnostics that are
computable from a constrained model. In this section we derive a robust form
of the Lagrange Multiplier (LM) or efficient score statistic that is
computable from statistics obtained after a single iteration away from the
restricted model. Assume that the hypothesis of interest

$H_0$: $c(\theta_o) = 0$

can be expressed in terms of $\theta_o$ as

$H_0$: $\theta_o = r(\alpha_o)$  for some $\alpha_o \in A$,

where $A \subset R^M$ and $M \equiv P - Q$. The function $r: A \to \Theta$ is assumed to be
continuously differentiable on the interior of $A$, and $\alpha_o \in$ int $A$. Note that
for the LM test we require only that $\alpha_o$ be in the interior of its parameter
space; $\theta_o$ is allowed to be on the boundary of $\Theta$. This is especially useful
in the present context where hypotheses concerning the conditional variances
and covariances of the process necessarily impose nonnegativity restrictions.
Let $R(\alpha) \equiv \nabla_\alpha r(\alpha)$ be the $P \times M$ gradient of $r$.

The LM test is based on the gradient of the log likelihood evaluated at
the constrained QMLE. Let the constrained quantities be

$\tilde l_t(\alpha) \equiv l_t(r(\alpha))$

$\tilde L_T(\alpha) \equiv L_T(r(\alpha))$

$\tilde s_t(\alpha) \equiv \nabla_\alpha \tilde l_t(\alpha) = s_t(r(\alpha))R(\alpha)$.

If $\tilde\alpha_T$, defined to be a solution to

$\max_{\alpha \in A} \tilde L_T(\alpha)$,

exists then the constrained QMLE of $\theta_o$ is $\tilde\theta_T \equiv r(\tilde\alpha_T)$. The LM statistic is a
quadratic form in the $P \times 1$ vector

$T^{-1/2} S_T(\tilde\theta_T)' = T^{-1/2}\sum_{t=1}^{T} s_t(\tilde\theta_T)'$.


Under conditional normality, the outer product of the gradient (OPG) LM
statistic is obtained as $TR_u^2$ from the outer product regression

(3.6)  1 on $s_t(\tilde\theta_T)$,  $t = 1, \ldots, T$,

where $R_u^2$ is the uncentered r-squared. Under conditional normality, $TR_u^2 \sim \chi_Q^2$ asymptotically.
If the conditional distribution of $y_t$ given $x_t$ is not normal then the
limiting distribution is generally not $\chi_Q^2$, and the nominal size can be very
different from the actual size. The OPG LM statistic can also have poor
finite sample properties even under normality (see Davidson and MacKinnon
(1984) and section 4). Other forms of the LM statistic, in particular those
based on generalized residuals, have better finite sample properties under
normality but are still invalid under nonnormality. The power of the
nonrobust test statistics for alternatives to the mean and variance can also
be adversely affected if normality does not hold.
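
The outer product regression (3.6) can be sketched as follows: regress a vector of ones on the $T \times P$ matrix of scores evaluated at the restricted estimates and form $T$ times the uncentered r-squared. The function name `opg_lm` is hypothetical; a least-squares solver is used so that collinear score columns do not cause a failure.

```python
import numpy as np

def opg_lm(scores):
    """OPG LM statistic: T * (uncentered R^2) from regressing 1 on the scores.

    scores : T x P array whose rows are s_t evaluated at the restricted estimates
    """
    scores = np.asarray(scores, dtype=float)
    T = scores.shape[0]
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(scores, ones, rcond=None)
    fitted = scores @ coef
    # Uncentered R^2 = fitted'fitted / ones'ones, and ones'ones = T,
    # so T * R_u^2 reduces to fitted'fitted.
    return float(fitted @ fitted)
```

As the text notes, this statistic has the stated chi-square limit only under conditional normality.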

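To derive a test of $H_0$ which is robust to nonnormality, we extend the
univariate case considered in Wooldridge (1988, Example 3.3). First, express
the (unrestricted) score in (2.5) as

(3.7)  $s_t(\theta)' = \begin{bmatrix} \nabla_\theta\mu_t(\theta) \\ \nabla_\theta\Omega_t(\theta) \end{bmatrix}' \begin{bmatrix} \Omega_t^{-1}(\theta) & 0 \\ 0 & [\Omega_t^{-1}(\theta) \otimes \Omega_t^{-1}(\theta)]/2 \end{bmatrix} \begin{bmatrix} \varepsilon_t(\theta)' \\ \mathrm{vec}[\varepsilon_t(\theta)'\varepsilon_t(\theta) - \Omega_t(\theta)] \end{bmatrix}$.
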
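Evaluating $s_t$ at $r(\alpha)$ yields the score with the restrictions imposed:

(3.8)  $s_t(r(\alpha))' = \begin{bmatrix} \nabla_\theta\mu_t(r(\alpha)) \\ \nabla_\theta\Omega_t(r(\alpha)) \end{bmatrix}' \begin{bmatrix} \Omega_t^{-1}(r(\alpha)) & 0 \\ 0 & [\Omega_t^{-1}(r(\alpha)) \otimes \Omega_t^{-1}(r(\alpha))]/2 \end{bmatrix} \begin{bmatrix} \varepsilon_t(r(\alpha))' \\ \mathrm{vec}[\varepsilon_t(r(\alpha))'\varepsilon_t(r(\alpha)) - \Omega_t(r(\alpha))] \end{bmatrix} \equiv \Lambda_t(\alpha)'\,\Gamma_t(\alpha)\,\eta_t(\alpha)$,

where $\Lambda_t(\alpha)$ is $(K+K^2) \times P$, $\Gamma_t(\alpha)$ is $(K+K^2) \times (K+K^2)$, and $\eta_t(\alpha)$ is $(K+K^2) \times 1$. Note
that $\eta_t(\alpha)$ is a vector of generalized residuals; in particular, $E[\eta_t(\alpha_o)|x_t]
= 0$ under $H_0$. Let $m_t(\alpha) \equiv \mu_t(r(\alpha))$ and $W_t(\alpha) \equiv \Omega_t(r(\alpha))$ be the restricted
mean and variance functions, respectively, with gradients

$\nabla_\alpha m_t(\alpha) = \nabla_\theta\mu_t(r(\alpha))R(\alpha)$

$\nabla_\alpha W_t(\alpha) = \nabla_\theta\Omega_t(r(\alpha))R(\alpha)$.

Note that $\nabla_\alpha m_t(\alpha)$ is $K \times M$ and $\nabla_\alpha W_t(\alpha)$ is $K^2 \times M$. It is convenient to stack
these gradients into the $(K+K^2) \times M$ matrix $\Psi_t(\alpha)$:

$\Psi_t(\alpha) \equiv \begin{bmatrix} \nabla_\alpha m_t(\alpha) \\ \nabla_\alpha W_t(\alpha) \end{bmatrix}$.

The restricted residual function is $\varepsilon_t(\alpha) \equiv \varepsilon_t(r(\alpha))$. Finally, let values
labelled with tilde be evaluated at $\tilde\alpha_T$; for example, $\tilde\Lambda_t \equiv \Lambda_t(\tilde\alpha_T)$, $\tilde\Gamma_t \equiv
\Gamma_t(\tilde\alpha_T)$, $\tilde\eta_t \equiv \eta_t(\tilde\alpha_T)$, and $\tilde\Psi_t \equiv \Psi_t(\tilde\alpha_T)$. Then the robust LM test can be
computed from Wooldridge (1988, Theorem 2.1):
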
computed  from  Wooldridge  (1988,  Theorem  2.1): 
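PROCEDURE 3.1:

(i) Compute $\tilde\alpha_T$, $\tilde m_t$, $\tilde W_t$, $\tilde\varepsilon_t$, $\nabla_\alpha\tilde m_t$, $\nabla_\alpha\tilde W_t$, $\nabla_\theta\tilde\mu_t$, and $\nabla_\theta\tilde\Omega_t$.
(ii) Run the matrix regression

$\tilde\Gamma_t^{1/2}\tilde\Lambda_t$ on $\tilde\Gamma_t^{1/2}\tilde\Psi_t$,  $t = 1, 2, \ldots, T$,

and save the residuals, say $\ddot\Lambda_t$.

(iii) Run the OLS regression

1 on $\ddot\eta_t'\ddot\Lambda_t$,  $t = 1, \ldots, T$,

where $\ddot\eta_t \equiv \tilde\Gamma_t^{1/2}\tilde\eta_t$, $t = 1, \ldots, T$, are the weighted generalized residuals. Under
$H_0$, use $TR_u^2$ as asymptotically $\chi_Q^2$.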
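
The steps of Procedure 3.1 can be sketched for the scalar case $K = 1$, where the weight matrix in (3.8) is diagonal with entries $1/W_t$ and $1/(2W_t^2)$. This is a sketch under stated assumptions, not the authors' code: the function name `robust_lm`, the array layout, and the use of a least-squares solver that tolerates the collinearity among the step (iii) regressors are all illustrative choices.

```python
import numpy as np

def robust_lm(eps, w, grad_mu, grad_omega, psi_mu, psi_omega):
    """Robust LM statistic of Procedure 3.1 for K = 1 (returns T * R_u^2).

    eps, w               : length-T restricted residuals and variances
    grad_mu, grad_omega  : T x P unrestricted gradients at the restricted estimates
    psi_mu, psi_omega    : T x M gradients of the restricted mean and variance
    """
    eps = np.asarray(eps, dtype=float)
    w = np.asarray(w, dtype=float)
    T = eps.shape[0]

    # Square root of the weight matrix for K = 1: diag(w^{-1/2}, 1/(sqrt(2) w)).
    g_mean = 1.0 / np.sqrt(w)
    g_var = 1.0 / (np.sqrt(2.0) * w)

    # Step (ii): weighted unrestricted gradients on weighted restricted gradients,
    # with mean rows stacked above variance rows (2T rows in total).
    lam = np.vstack([g_mean[:, None] * np.asarray(grad_mu, dtype=float),
                     g_var[:, None] * np.asarray(grad_omega, dtype=float)])
    psi = np.vstack([g_mean[:, None] * np.asarray(psi_mu, dtype=float),
                     g_var[:, None] * np.asarray(psi_omega, dtype=float)])
    coef, *_ = np.linalg.lstsq(psi, lam, rcond=None)
    lam_res = lam - psi @ coef

    # Step (iii): weighted generalized residuals times residual gradients,
    # summed over the mean and variance rows belonging to each t.
    eta_w = np.concatenate([g_mean * eps, g_var * (eps**2 - w)])
    prod = eta_w[:, None] * lam_res
    regressors = prod[:T] + prod[T:]          # T x P

    ones = np.ones(T)
    b, *_ = np.linalg.lstsq(regressors, ones, rcond=None)
    fitted = regressors @ b
    return float(fitted @ fitted)             # equals T * uncentered R^2
```

For example, testing a zero mean in a constant-mean, constant-variance model uses `grad_mu` rows `[1, 0]`, `grad_omega` rows `[0, 1]`, `psi_mu` rows `[0]`, and `psi_omega` rows `[1]`.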

This  form  of  the  LM  statistic  has  several  attractive  features.   Firstly, 
just  as  with  the  robust  Wald  statistic,  the  procedure  is  valid  under 
nonnormality  and  loses  nothing  in  terms  of  asymptotic  local  power  if  the 
normality  assumption  happens  to  hold.   Secondly,  it  requires  only  the 
estimates  from  the  restricted  model,  and  there  is  no  need  to  solve  for  the 
implicit  constraint  function  or  its  gradient.   Thirdly,  only  first 
derivatives  of  the  conditional  mean  and  variance  functions  (evaluated  at  the 
restricted estimates) are needed for the computations. Finally, as discussed
in Wooldridge (1988), this form of the LM statistic can be computed using any
$\sqrt{T}$-consistent estimator for $\alpha_o$ without any loss in efficiency under
normality.

A useful feature of both the robust Wald and LM tests is that, when the
mean and variance can be appropriately separated, allowing for consistent
estimation of the mean parameters under nonnormality and violation of
(2.1.b), then the conditional mean tests can be robust to misspecification of
$V(y_t|x_t)$. In the Wald case it is straightforward to construct conditional
mean tests by focusing only on elements of $\theta$ that index the conditional mean.
In the case of the LM statistics, one simply sets $\eta_t(\alpha) \equiv \varepsilon_t(\alpha)' \equiv \varepsilon_t(r(\alpha))'$,
$\Gamma_t(\alpha) \equiv \Omega_t^{-1}(r(\alpha))$, $\Psi_t(\alpha) \equiv \nabla_\alpha m_t(\alpha)$, and $\Lambda_t(\alpha) \equiv \nabla_\theta\mu_t(r(\alpha))$. It is important to
stress that consistency of the QMLE for the conditional mean parameters is
not alone sufficient for the asymptotic covariance formula for the mean
parameters, given by the appropriate block of $\hat A_T^{-1}\hat B_T\hat A_T^{-1}$, to be valid. If
$\theta = (\alpha', \beta')'$ and $\alpha$ represents the parameters of the conditional mean then
what is needed is that the asymptotic distribution of $\sqrt{T}(\hat\alpha_T - \alpha_o)$ be
independent of the asymptotic distribution of $\sqrt{T}(\hat\beta_T - \beta^*)$, where $\beta^*$ is the
plim of the QMLE $\hat\beta_T$ under (2.1.a). Consequently, it is not true that the
conditions of Pagan and Sabau's (1987) Theorem 5 are sufficient for the
robust Wald and LM statistics to be valid for, say, a misspecified ARCH
model. However, if the model for $V(y_t|x_t)$ does not depend on $\alpha$ then the Wald
and LM tests for $E(y_t|x_t)$ are robust to violations of (2.1.b) (and
normality).

In certain cases, such as ARCH-M models or Amemiya's (1973) model of
heteroskedasticity, misspecification of (2.1.b) leads to inconsistency of all
elements in $\theta$ (if a normal likelihood function is maximized). By the nature
of these models most hypotheses are joint hypotheses about the
conditional mean and conditional variance, and then robustness to conditional
variance misspecification does not make much sense. Robustness to
nonnormality is obtained by applying Procedure 3.1.

The OPG LM test has no systematic power for detecting nonnormality; it
can lead to inference with the wrong asymptotic size without the benefit of
being able to detect nonnormality with any regularity. The robust (RB) LM
procedure outlined above is asymptotically equivalent to the OPG LM procedure
under normality and, based on the Monte Carlo evidence in section 4, the
robust procedure is typically much better in finite samples than the OPG LM
statistic.

As mentioned above, a third possibility for an LM statistic is based on
the weighted generalized residuals $\tilde\Gamma_t^{1/2}\tilde\eta_t$. Under conditional normality it
can be shown that $E(\eta_t^{o\prime}\Gamma_t^o\eta_t^o|x_t) = K + K(K+1)/2$. This fact can be used to
show that $[K + K(K+1)/2]\,TR_u^2$ from the regression

(3.9)  $\tilde\Gamma_t^{1/2}\tilde\eta_t$ on $\tilde\Gamma_t^{1/2}\tilde\Lambda_t$,  $t = 1, \ldots, T$,

is asymptotically $\chi_Q^2$ under $H_0$ and normality, where the regression is carried
out by stacking the observations and using OLS (see Engle (1982b)). Because
this statistic employs an estimate of the Hessian as the estimated
information matrix, we subsequently call it the HE LM statistic. As
evidenced by the simulations in section 4, it is typically better behaved in
finite samples than the OPG LM statistic. Nonetheless, as mentioned
previously, it is not asymptotically robust to nonnormality. This also is
borne out by the simulations in section 4. If one is computing conditional
mean tests, then $\tilde\eta_t = \tilde\varepsilon_t'$, $\tilde\Gamma_t = \tilde W_t^{-1}$, $\tilde\Lambda_t = \nabla_\theta\tilde\mu_t$, and then the statistic is $KTR_u^2$.
But this test is not robust to second moment misspecification. If $\tilde\eta_t$
consists only of conditional variances and covariances then the statistic is
$[K(K+1)/2]\,TR_u^2$.

Procedure  3.1  is  computationally  somewhat  more  difficult  than  (3.6)  or 
(3.9),  but  not  by  much.   It  requires  exactly  the  same  quantities  used  in 
computing  the  HE  LM  statistic  and  in  implementing  efficient  computational 
algorithms.   Our  view  is  that  the  additional  computational  burden  embodied  in 
the matrix regression of step (ii) is warranted in many situations. If
normality is not a maintained assumption, one can never be sure at what
asymptotic size a test is being conducted unless the robust form is used.
The  widespread  use  of  tests  for  nonnormality  in  the  finance  literature 
suggests  that  many  researchers  are  not  willing  to  adopt  normality  as  a 
maintained  assumption. 

A useful extension of Procedure 3.1, which allows for a variety of other
specification tests, is available from the results of Wooldridge (1988).
There is no need to focus on tests that can be derived from nesting models.
The matrix of unrestricted gradients evaluated at the restricted estimates,

$\tilde\Lambda_t \equiv \begin{bmatrix} \nabla_\theta\tilde\mu_t \\ \nabla_\theta\tilde\Omega_t \end{bmatrix}$,

can be replaced in step (ii) of the robust LM procedure by essentially any
function of $x_t$, $\tilde\alpha_T$, and other nuisance parameter estimates, say $\hat\pi_T$, such that
$\sqrt{T}(\hat\pi_T - \pi_T^o) = O_p(1)$ for some nonstochastic sequence $\{\pi_T^o\}$. This extension
allows for robust, regression-based nonnested hypothesis testing (in which
case $\hat\pi_T$ would be estimates from a competing model) as well as many other
useful diagnostics. For example, the diagnostics employed by Bollerslev,
Engle, and Wooldridge (1988) for evaluating a dynamic capital asset pricing
model (CAPM), which involve conducting LM tests for exclusion of fitted
values from competing models, can easily be "robustified" within this
framework.

4. Simulation Experiments

In  order  to  investigate  the  finite  sample  performance  and  applicability 
of  the  robust  inference  procedures  discussed  in  sections  2  and  3,  several 
simulation  experiments  were  performed.   To  facilitate  the  presentation,  all 
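of the simulated models are nested within the AR(2)-GARCH(1,2) model,

$$
\begin{aligned}
y_t &= \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t \\
\omega_t^2 &= \delta + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \beta_1 \omega_{t-1}^2 \qquad\qquad (4.1) \\
\varepsilon_t &= \omega_t e_t, \qquad e_t \ \text{i.i.d.}\ t_\nu
\end{aligned}
$$

where $t_\nu$ denotes a standard t-distribution with $\nu$ degrees of freedom. The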
autoregressive  conditional  heteroskedasticity  (ARCH)  model  was  originally 
introduced  by  Engle  (1982a)  and  later  extended  to  the  generalized  ARCH 
(GARCH)  model  in  Bollerslev  (1986).   For  ease  of  exposition  the  "o"  that  has 
been  used  heretofore  to  index  true  parameters  is  omitted  in  (4.1). 
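
As a rough illustration of the data generating process in (4.1), the sketch
below simulates one sample path. It is not the authors' code: the function
name, the burn-in length, the seed, and the start-up value for the conditional
variance are all assumptions made here, and NumPy's generators stand in for
the IMSL routines. The t shocks are rescaled to unit variance, and the sketch
assumes $\alpha_1 + \alpha_2 + \beta_1 < 1$.

```python
import numpy as np

def simulate_ar_garch(phi1, phi2, delta, alpha1, alpha2, beta1, nu, T,
                      burn=500, seed=0):
    """Simulate y_t from an AR(2)-GARCH(1,2) model with unit-variance shocks.

    nu = np.inf gives conditionally normal errors. `burn` and `seed` are
    illustrative choices, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = T + burn
    z = rng.standard_normal(n)
    if np.isinf(nu):
        e = z
    else:
        # sqrt(nu - 2) * N(0,1) / sqrt(chi-square_nu) has mean 0, variance 1
        e = np.sqrt(nu - 2.0) * z / np.sqrt(rng.chisquare(nu, n))
    y = np.zeros(n)
    eps = np.zeros(n)
    # start the conditional variance at its unconditional value (assumption)
    omega2 = np.full(n, delta / (1.0 - alpha1 - alpha2 - beta1))
    for t in range(2, n):
        omega2[t] = (delta + alpha1 * eps[t - 1] ** 2
                     + alpha2 * eps[t - 2] ** 2 + beta1 * omega2[t - 1])
        eps[t] = np.sqrt(omega2[t]) * e[t]
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]
    # drop the burn-in so the start-up values have little influence
    return y[burn:], eps[burn:], omega2[burn:]
```

For example, `simulate_ar_garch(.5, .0, .05, .15, .0, .8, 5.0, 200)` would
correspond to the parameter set of model 10 in table 1.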

Numerous  applications  of  the  ARCH  and  GARCH  models  have  already 
appeared.   In  particular,  several  studies  have  employed  the  ARCH  methodology 
in  characterizing  the  distributional  properties  of  financial  and  monetary 
time  series  data  where  both  volatility  clustering  and  leptokurtosis  have  a 
long history as salient empirical regularities. A selective but far from
exhaustive list of some of these earlier references is given in Engle and
Bollerslev (1986). Although the ARCH model with conditional normal errors
has an unconditional leptokurtic distribution (see Milhøj (1985)), it is
often  found  that  this  model  does  not  sufficiently  account  for  the 
leptokurtosis  present  in  high  frequency  financial  data.   To  that  end 
Bollerslev  (1987)  suggested  the  use  of  a  conditional  t-distribution  as  in 
model (4.1) above. Subsequently, this model has been estimated by Baillie
and Bollerslev (1987), Baillie and DeGennaro (1988), Engle and Bollerslev
(1986), Hsieh (1988), and Schwert and Seguin (1988), among others.

The derivatives for the conditional mean and the conditional variance
functions are given in appendix B, and under the assumption of conditional
normal innovations, i.e. $\nu = \infty$, the score and the information matrix
for the log likelihood function follow by direct substitution from (2.5) and
(2.7). Although the assumption of conditional normality might be violated, the
results in section 2 allow asymptotically valid inference about the true
parameter vector, $\theta_o$, to be carried out from (2.8) based on the QMLE
obtained under the conditional normality assumption, $\hat\theta_T$. The use of
this robust form of the Wald test when conducting inference in ARCH models has
been previously suggested by Weiss (1984, 1986); however, no evidence on the
small sample performance of the procedure is yet available. Robust LM tests
for hypotheses about $\theta_o$ based on the constrained QMLE of $\theta_o$ can
be calculated from the regressions outlined in Theorem 3.1. The finite sample
properties of these LM tests under nonnormality are also unknown.

The first set of experiments designed to shed some light on the
applicability of these robust inference procedures relates to the estimation
of a simple AR(1) model. Under the assumption of conditionally homoskedastic
normal errors, the QMLE for $\phi_1$ and $\delta$ are given by OLS. The
parameter sets for these first eight models are listed in table 1 as models 1
through 8. For ease of comparison the unconditional variance is set equal to
one for all of the models. However, the conditional variance and the degree of
leptokurtosis in the conditional t-distribution vary across the models.
The normally distributed random variables were generated by the IMSL
subroutine GGNML. The $t_\nu$ distributed random variables were formed as
$(\nu-2)^{1/2}$ times a N(0,1) random variable divided by the square root of a
$\chi^2_\nu$ variate generated by the IMSL subroutine GGAMR. All of the
estimates in tables 2 through 5 are based on 10,000 replications. The
simulated sample mean and variance for $\hat\phi_1$ and $\hat\delta$ are
reported in table 2. From the table the well known small sample downward bias
in the least squares estimates for $\phi_1$ is seen to be of relatively minor
order in the present context with 100 or 200 observations; see Sawa (1978) for
an analytical expression of the bias with conditionally normal errors.
However, it is interesting to note that the bias is a slightly increasing
function of the degree of heteroskedasticity and leptokurtosis in the
conditional distribution. Similarly, the presence of conditional
heteroskedasticity and/or leptokurtic errors raises the variability of the
estimates. This is especially true for the variance estimates, where the
degree of variability more than doubles when moving from the normal to the
$t_5$ distribution. Also, note that for the AR(2) models 5 and 6, the mean
sample estimates for $\phi_1$ from the misspecified AR(1) regression are
equally biased below the true first order autocorrelation,
$\phi_1/(1-\phi_2) = .588$.
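
The value .588 can be checked from the first Yule-Walker equation of a
stationary AR(2) process. In the short derivation below, $\rho_1$ denotes the
first order autocorrelation (a symbol introduced here for illustration) and
the parameter values are those of models 5 and 6 in table 1.

```latex
\rho_1 = \phi_1 + \phi_2\,\rho_1
\quad\Longrightarrow\quad
\rho_1 = \frac{\phi_1}{1-\phi_2} = \frac{.5}{1-.15} \approx .588
```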

The small sample distribution of the QMLE's for $\phi_1$ is further
illustrated in table 3, where the empirical fractiles are listed for three
different Wald statistics when testing a true null hypothesis for $\phi_1$. The
exact form of the RB statistic refers to the robust form of the test
statistic as given in equation (2.8) in section 2. This form of the test is
compared to the standard Hessian (HE) based Wald test, which relies on the
inverse of the quasi-information matrix, $\hat A^{-1}$, as an estimate for the
variance of $\hat\phi_1$. The OPG based test uses the matrix $\hat B^{-1}$ to
estimate the variance of $\hat\phi_1$. Both the HE and OPG tests are used
regularly in the literature.

For each model, table 3 reports the proportion of the replications that
fall below the .900, .950, .975, and .990 fractiles in the $\chi^2_1$
distribution. With 10,000 replications a consistent estimate of the variance
of the empirical fractiles, say $\hat p$, is given by $\hat p(1-\hat p)/10,000$.
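
To give a sense of the sampling error in the tabulated fractiles, the
binomial variance formula above can be evaluated directly. The helper below is
a minimal sketch; its name and signature are chosen here for illustration.

```python
import math

def fractile_se(p_hat, n_rep=10_000):
    """Standard error of an empirical fractile from n_rep replications."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_rep)
```

At $\hat p = .950$ the standard error is roughly .002, so with 10,000
replications deviations from the nominal fractile of more than about .004
would be unlikely to arise from simulation noise alone.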

For models 1, 2, and 3 with conditionally homoskedastic, normally
distributed errors, the OLS estimator $\hat\phi_1$ is equal to the MLE, and all
three tests are asymptotically equivalent in this situation. Indeed, from
table 3, the finite sample properties of the three different tests are seen to
be very similar for models 1, 2, and 3, with the actual size being very close
to the nominal size. However, with conditional heteroskedasticity the RB test
is clearly preferred to either of the other two tests. Both the HE and OPG
estimators systematically underestimate the standard error of $\hat\phi_1$,
yielding an empirical distribution considerably more dispersed than a
$\chi^2_1$. For instance, from table 2, the mean sample standard deviation for
$\hat\phi_1$ from model 8 equals .128, whereas the sample mean standard
deviations as estimated by RB, HE, and OPG are found to be .113, .087, and
.071, respectively. Moreover, this ordering of the three test statistics
remains the same across models 4, 7, and 8. With conditional
heteroskedasticity the empirical sizes of the HE and OPG tests are much larger
than the nominal size. For example, from model 8 using a five percent test,
the probability of a type I error for the OPG test equals .262. It is worth
noting that bias correcting the estimators for $\phi_1$ does not alter this
conclusion. None of the empirical fractiles for the bias corrected test
statistics deviates by more than .005 from the results reported in table 3.
For reasons of space the bias-corrected estimates are not reported.
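
The three Wald statistics compared in table 3 differ only in the variance
estimate used for the AR(1) coefficient. The sketch below illustrates this for
the scalar case. It is a simplification made here, not the paper's equation
(2.8): the AR coefficient is treated in isolation from the variance parameter,
so `A` and `B` are the scalar Hessian-based and outer-product quantities for
that coefficient alone, and the function name is hypothetical.

```python
import numpy as np

def ar1_wald_stats(y, phi_null):
    """Wald statistics for H0: phi_1 = phi_null under three variance estimates."""
    y = np.asarray(y, dtype=float)
    x, yy = y[:-1], y[1:]
    phi_hat = (x @ yy) / (x @ x)           # OLS = QMLE under homoskedasticity
    e = yy - phi_hat * x
    delta_hat = (e @ e) / len(e)           # estimated error variance
    A = (x @ x) / delta_hat                # Hessian-based information (sum)
    B = ((e * x) ** 2).sum() / delta_hat ** 2  # outer product of the score (sum)
    var = {"RB": B / A ** 2, "HE": 1.0 / A, "OPG": 1.0 / B}
    wald = {k: (phi_hat - phi_null) ** 2 / v for k, v in var.items()}
    return phi_hat, wald
```

In this scalar setting the robust variance reduces to the familiar
heteroskedasticity-consistent form, and the three statistics coincide
asymptotically only when `A` and `B` estimate the same quantity.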
Having estimated a simple constrained model, it is desirable to test for
deviations from that specification. Table 4 reports the simulation results
for the three different LM-type test statistics discussed in section 3
designed to test the null hypothesis of an AR(1) model versus an AR(2)
specification, i.e. $\phi_2 = 0$ versus $\phi_2 \neq 0$. From Theorem 3.1 the
robust form of the LM test, RB, can be calculated as $TR_u^2$ from a regression
of 1 on $\tilde\varepsilon_t\ddot\varepsilon_{t-1}$, where $\tilde\varepsilon_t$
is the residual from the AR(1) regression and $\ddot\varepsilon_{t-1}$ denotes
the residual from the regression of $\tilde\varepsilon_{t-1}$ on $y_{t-1}$; see
also Wooldridge (1987). In implementing the $TR^2$ test statistic used
throughout this section we replace T with the actual number of observations
used in the auxiliary regression; here T-1. The HE Lagrange Multiplier test,
given by regression (3.9), is readily evaluated as $TR_u^2$ from the regression
of $\tilde\varepsilon_t$ on $y_{t-1}$ and $\tilde\varepsilon_{t-1}$. This test
has been widely applied in the literature. The OPG test is obtained as
$TR_u^2$ from the regression of 1 on $\tilde\varepsilon_t y_{t-1}$ and
$\tilde\varepsilon_t\tilde\varepsilon_{t-1}$; this regression corresponds to
equation (3.6) in section 3. All three of the tests extend in a
straightforward way to higher orders of serial dependence by including
additional lags of $\tilde\varepsilon_t$ or the relevant cross products in each
of these regressions.
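
The three auxiliary regressions just described can be sketched in a few
lines. This is an illustrative reading of the text rather than the authors'
implementation: the function and helper names are invented here, the AR(1)
model is fitted without an intercept as in (4.1), and the sample size in each
statistic is the number of observations used in the auxiliary regression.

```python
import numpy as np

def _resid(y, X):
    """Least squares residuals from regressing y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

def _n_r2_uncentered(y, X):
    """n times the uncentered R-squared from regressing y on X."""
    u = _resid(y, X)
    return len(y) * (1.0 - (u @ u) / (y @ y))

def lm_ar1_vs_ar2(y):
    """RB, HE and OPG LM statistics for H0: phi_2 = 0 in an AR(2) model."""
    y = np.asarray(y, dtype=float)
    ylag = y[:-1]
    e = _resid(y[1:], ylag[:, None])        # AR(1) residuals
    e_t, e_l, y_l = e[1:], e[:-1], ylag[1:]  # e_t, e_{t-1}, y_{t-1}
    ones = np.ones_like(e_t)
    # RB: purge y_{t-1} from e_{t-1}, then regress 1 on e_t * purged residual
    r = _resid(e_l, y_l[:, None])
    rb = _n_r2_uncentered(ones, (e_t * r)[:, None])
    # HE: regress e_t on y_{t-1} and e_{t-1}
    he = _n_r2_uncentered(e_t, np.column_stack([y_l, e_l]))
    # OPG: regress 1 on the two score-type products
    opg = _n_r2_uncentered(ones, np.column_stack([e_t * y_l, e_t * e_l]))
    return {"RB": rb, "HE": he, "OPG": opg}
```

Each statistic would be compared with the fractiles of a $\chi^2_1$
distribution, as in table 4.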

Again, with conditionally homoskedastic errors, the three tests are
asymptotically equivalent. From models 1 and 2 in table 4 this carries over
to finite samples, where the actual size is close to the nominal size for all
three tests. The power properties of the three tests are also very similar
in this situation as seen by examining the results for model 5. The
theoretical results are therefore substantiated by the Monte Carlo evidence;
the robust test is not only asymptotically equivalent to the more traditional
forms of the test even when the auxiliary assumption (in this case
homoskedasticity) is satisfied, but in finite samples the three tests
perform very similarly under ideal conditions.

When conditional heteroskedasticity is present the actual size for both
the HE and OPG tests is significantly higher than the nominal size. This is
in accordance with the findings in Diebold (1986) and Domowitz and Hakkio
(1988). Interestingly, the HE LM test is more adversely affected by
heteroskedasticity than the OPG LM test, whereas the results in table 3
indicate that the HE Wald test is more robust than the OPG Wald test.
Further, when comparing models 7 and 8 the shape of the distribution is also
seen to be important for the size of the HE and OPG tests. In contrast, the
RB test is indeed robust to both heteroskedasticity and leptokurtosis.
However, comparing the results from models 5 and 6 indicates that the power
of the robust test decreases with departures from homoskedasticity. This
finding is not very surprising since RB is optimal under homoskedasticity and
normality.

The next set of results relates to the performance of the same three LM
tests when testing for ARCH(1) errors, i.e. $\alpha_1 = 0$ versus
$\alpha_1 > 0$. In this situation the second regression for the RB test
described in Theorem 3.1 is simply equal to a regression of 1 on
$(\tilde\varepsilon_t^2 - \tilde\delta)(\tilde\varepsilon_{t-1}^2 - \tilde\delta)$;
see also Wooldridge (1988). The "studentized" version of the Breusch and Pagan
(1979) and Godfrey (1978) LM statistic is also easy to compute in this
situation. This HE version of the LM test has been advocated by Engle (1984),
Hall (1984), and Koenker (1981), among others. As shown in Engle (1982b), the
test for first order ARCH is given by $TR^2$ from the regression of
$\tilde\varepsilon_t^2$ on 1 and $\tilde\varepsilon_{t-1}^2$, where $R^2$ is
the centered r-squared. Because of its computational simplicity this LM
diagnostic has already found wide use in applied time series econometrics.
Finally, the OPG test for ARCH(1) takes the form $TR_u^2$ from the regression
1 on $(\tilde\varepsilon_t^2 - \tilde\delta)$ and
$\tilde\varepsilon_{t-1}^2(\tilde\varepsilon_t^2 - \tilde\delta)$. All tests
extend readily to checking for higher orders of ARCH by including additional
lags in the auxiliary regressions.
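
As an illustration, the three ARCH(1) diagnostics can be computed from the
residuals of the homoskedastic AR(1) fit as follows. This is a sketch under
assumptions made here: the function name is hypothetical, the variance
estimate is taken to be the mean squared residual, and the sample size in each
statistic is the number of observations in the auxiliary regression.

```python
import numpy as np

def lm_arch1(e):
    """RB, HE and OPG LM statistics for ARCH(1), given AR(1) residuals e."""
    e2 = np.asarray(e, dtype=float) ** 2
    delta = e2.mean()                      # variance estimate under the null
    u_t, e2_l = e2[1:] - delta, e2[:-1]    # e_t^2 - delta and e_{t-1}^2
    n = len(u_t)
    ones = np.ones(n)

    def n_r2_uncentered(y, X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ b
        return n * (1.0 - (r @ r) / (y @ y))

    # RB: regress 1 on (e_t^2 - delta)(e_{t-1}^2 - delta)
    rb = n_r2_uncentered(ones, (u_t * (e2_l - delta))[:, None])
    # HE: n times the centered R-squared from e_t^2 on 1 and e_{t-1}^2
    X = np.column_stack([ones, e2_l])
    b, *_ = np.linalg.lstsq(X, e2[1:], rcond=None)
    r = e2[1:] - X @ b
    tss = ((e2[1:] - e2[1:].mean()) ** 2).sum()
    he = n * (1.0 - (r @ r) / tss)
    # OPG: regress 1 on (e_t^2 - delta) and e_{t-1}^2 (e_t^2 - delta)
    opg = n_r2_uncentered(ones, np.column_stack([u_t, e2_l * u_t]))
    return {"RB": rb, "HE": he, "OPG": opg}
```

Only the HE version uses the centered r-squared; the RB and OPG versions
regress a column of ones, for which only the uncentered r-squared is defined.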

As can be seen from table 5, the results for the RB test are less
encouraging in this situation. For all three tests the actual size is
different from the nominal size. The RB and OPG tests have larger actual
size, whereas the HE test is conservative. For both the HE and OPG tests this
difference is more pronounced with leptokurtic errors. As expected, the size
of RB does not differ between models 1 and 2. However, the power properties
of the RB test are rather poor. Although the HE test is conservative, it
clearly outperforms both the RB and OPG tests in terms of power. Table 5
suggests that the RB test is not asymptotically equivalent to the HE and OPG
tests against nonlocal alternatives. It is also interesting to note that the
power of all of the tests decreases dramatically with the degrees of freedom
parameter $\nu$. For instance, using a nominal size of five percent, the
probability of a type II error for the HE test differs by .183 between models
7 and 8. The results in table 5 are in accordance with the findings in
Engle, Hendry, and Trumble (1985), where a one sided version of the HE test
is compared to a modified Wald type test from the auxiliary HE regression.
For small deviations from the null hypothesis the HE test is much more
powerful than the Wald type test. See also Bollerslev (1988) and Milhøj
(1987).

We should emphasize that we have not presented evidence on the behavior
of the three statistics for ARCH(1) when conditional homoskedasticity holds
but the conditional fourth moment is nonconstant. An example would be
conditionally t-distributed errors with constant variance but with degrees of
freedom depending on $x_t$. In this situation the RB test will retain the
appropriate asymptotic size while HE and OPG will generally have the wrong
size. We conjecture that this carries over to the finite sample properties
of the tests as well.

Tables 6 through 10 refer to the estimation and testing results obtained
with a nonlinear AR(1)-GARCH(1,1) model. The parameter sets used in
characterizing the different models are given in table 1 as models 9 through
15. The values for the GARCH parameters reflect the estimates reported in
the literature. With high frequency financial or monetary data, the
estimates for $\hat\alpha_1 + \hat\beta_1$ are typically very close to one,
with $\hat\alpha_1$ in the range from .1 to .2; see, for example, Baillie and
Bollerslev (1988), Bollerslev (1987), and Engle and Bollerslev (1986). The
QMLE for $\theta = (\phi_1, \delta, \alpha_1, \beta_1)$ under the assumption of
conditional normality was found by maximizing the quasi-log likelihood in
(2.4) through a combined grid search and a standard iterative procedure based
on the Berndt, Hall, Hall, and Hausman (1974) algorithm. The convergence
criterion was taken as an r-squared less than .001 in the BHHH updating
regression. Some preliminary analysis suggested that very similar results can
be expected using a more stringent convergence criterion. Due to the computer
intensive nature of the estimation procedure, the results reported in tables 6
through 10 are calculated from 1,000 replications only.
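
For concreteness, the objective being maximized can be sketched as the
Gaussian quasi-log likelihood of an AR(1)-GARCH(1,1) model. Equation (2.4) is
not reproduced in this section, so the form below (including the constant
term) is an assumption, as are the function name and the start-up value used
for the conditional variance recursion. The grid search and BHHH iterations
are not shown.

```python
import numpy as np

def quasi_loglik(theta, y):
    """Gaussian quasi-log likelihood for an AR(1)-GARCH(1,1) model.

    theta = (phi1, delta, alpha1, beta1). The first conditional variance is
    set to the mean squared residual, which is an illustrative choice.
    """
    phi1, delta, alpha1, beta1 = theta
    y = np.asarray(y, dtype=float)
    eps = y[1:] - phi1 * y[:-1]
    omega2 = np.empty_like(eps)
    omega2[0] = np.mean(eps ** 2)
    for t in range(1, len(eps)):
        omega2[t] = delta + alpha1 * eps[t - 1] ** 2 + beta1 * omega2[t - 1]
    return -0.5 * np.sum(np.log(2.0 * np.pi) + np.log(omega2) + eps ** 2 / omega2)
```

Any numerical optimizer that respects the positivity and stationarity
restrictions on the variance parameters could be applied to the negative of
this function.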

The sample mean and standard deviations for the QMLE's for the seven
different AR(1)-GARCH(1,1) data generating mechanisms are reported in table
6. The small sample bias in the estimates for $\phi_1$ is very similar to the
bias obtained when estimating the AR(1) model under the assumption of
conditional homoskedasticity, and the small sample gain in terms of efficiency
is relatively minor. For instance, for model 9 the sample mean and standard
deviation for $\hat\phi_1$ from the correctly specified AR(1)-GARCH(1,1)
estimation equal .493 (.066), compared with .493 (.075) for the AR(1) model
estimated under the assumption of conditional homoskedasticity.

Turning to the GARCH(1,1) parameters, table 6 shows that the MLE
estimator of $\alpha_1$ is essentially unbiased, whereas $\hat\beta_1$ is
somewhat downward biased. As shown in Bollerslev (1988), the GARCH(1,1) model
is readily interpreted as an ARMA(1,1) model for conditional second moments
with autoregressive and moving average parameters $\alpha_1 + \beta_1$ and
$-\beta_1$, respectively. In the ARMA formulation both parameters show a bias
toward zero in small samples. Along these lines it is interesting to note
that Engle, Hendry, and Trumble (1985) observed a downward bias in the MLE
estimators for $\alpha_1$ in a simple ARCH(1) model.
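
The ARMA(1,1) interpretation can be verified in one step. In the sketch
below, $\omega_t^2$ denotes the conditional variance and
$v_t \equiv \varepsilon_t^2 - \omega_t^2$ is a symbol introduced here for the
serially uncorrelated innovation in the squared errors.

```latex
\begin{aligned}
\varepsilon_t^2 &= \omega_t^2 + v_t
  = \delta + \alpha_1\varepsilon_{t-1}^2 + \beta_1\omega_{t-1}^2 + v_t \\
 &= \delta + \alpha_1\varepsilon_{t-1}^2
    + \beta_1\left(\varepsilon_{t-1}^2 - v_{t-1}\right) + v_t \\
 &= \delta + (\alpha_1 + \beta_1)\,\varepsilon_{t-1}^2 + v_t - \beta_1 v_{t-1}
\end{aligned}
```

The autoregressive coefficient is therefore $\alpha_1 + \beta_1$ and the
moving average coefficient is $-\beta_1$, as stated in the text.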

With leptokurtic conditional errors the bias and the variability in the
QMLE's obtained under the conditional normality assumption are slightly larger
than the corresponding values for the MLE's with normal errors. Not
surprisingly, both the biases and the variability for the correctly specified
AR(1)-GARCH(1,1) model decrease with the sample size. Also, the biases for
$\hat\phi_1$ from models 12 and 13 are comparable to the biases reported in
table 2 for the misspecified (in mean) homoskedastic AR(1) models. Similarly,
the biases in the estimators for $\alpha_1$ and $\beta_1$ from models 14 and 15
are as expected.

The small sample behavior of $\hat\phi_1$ is further illustrated in table 7,
where the empirical fractiles for the RB, HE, and OPG tests for the true null
hypothesis are tabulated. Whereas conditional heteroskedasticity was seen to
render the nominal size for the HE and OPG tests invalid, as might be
expected all three tests perform well when the heteroskedasticity is
correctly modeled. As was the case for the Wald tests in table 3, the
presence of leptokurtic errors does not appreciably alter the size of any of
the three tests.

In contrast, when testing $H_0$: $\alpha_1 = .15$, the three different
covariance matrix estimators are seen to lead to very different results. From
table 8, with conditional normal errors but only 200 observations, the actual
size of the RB test is much closer to the nominal size than the sizes for the
HE and OPG tests. Moreover, although the HE test seems to perform reasonably
well with a conditional $t_5$ distribution and only 200 observations, the
results for model 11, with 400 observations, make it clear that the
asymptotic size of the HE test is much larger than its nominal size. This is
also true of the OPG test, where a five percent test for model 11 actually
results in a probability of type I error of .211.

In table 9 the finite sample distributions of the three LM statistics for
testing the AR(1)-GARCH(1,1) model versus the AR(2)-GARCH(1,1) model are
reported; specifically, $H_0$: $\phi_2 = 0$ versus $H_1$: $\phi_2 \neq 0$. The
two regressions for the RB test can be deduced from Theorem 3.1. Because the
RB statistic is intended to be robust against nonnormality, the procedure is
more complicated than it would be if the statistic were only to be robust to
second moment misspecification; nevertheless, computation is quite simple.
Note that the second regression from Theorem 3.1 contains perfect
multicollinearity; in

2  ..  .. 

calculating  R  only  the  last  element  in  rj'^A      is  used.   This  corresponds  to 
u  t  t 

the only nonredundant element. The HE test that exploits block diagonality
of the information matrix for AR(2) errors was calculated as $TR_u^2$ from the
regression $\tilde\varepsilon_t\tilde\omega_t^{-1}$ on
$y_{t-2}\tilde\omega_t^{-1}$ and $\tilde\varepsilon_{t-1}\tilde\omega_t^{-1}$.
Finally, the OPG statistic is obtained as $TR_u^2$ of 1 on the 1x5 quasi-score
evaluated at the estimates under the null, i.e. $s_t(\tilde\theta_T)$, where
$\tilde\theta_T = (\tilde\phi_1, 0, \tilde\delta, \tilde\alpha_1, \tilde\beta_1)$.
From the table, the fractiles of the $\chi^2_1$ distribution provide a good
approximation to the small sample size for all three statistics, although the
OPG test is slightly more dispersed. Correcting for this difference in size
yields similar power estimates across the three tests, as seen from models 12
and 13.

The final table gives the distribution of the LM test for the
AR(1)-GARCH(1,1) versus the AR(1)-GARCH(1,2) model, i.e. $\alpha_2 = 0$ versus
$\alpha_2 \neq 0$. The calculation of the RB and OPG tests follows the same
recipes as the corresponding tests in table 9. Note, however, that no simple
form of the HE test is readily available, as the regression (3.9) involves the
derivative of the conditional variance, which takes a recursive form; see
appendix B. From the table, the actual size of both the RB test and the HE
test is in accordance with the nominal size, whereas the OPG test rejects far
too often. This difference in size for the OPG test corresponds to the
findings reported in table 5 for the LM tests for ARCH(1) models in the
homoskedastic AR(1) model. Also, the powers of the RB and HE tests are very
similar and, as for the LM tests for ARCH effects in table 5, the power
decreases substantially with the degree of conditional leptokurtosis; cf.
models 14 and 15. Since the calculations required for the HE test are more
involved in this situation, a related test computed as $TR_u^2$ from the
regression of $(\tilde\varepsilon_t^2\tilde\omega_t^{-2} - 1)$ on
$\tilde\varepsilon_{t-2}^2\tilde\omega_t^{-2}$ has been used in practice. This
simpler residual based diagnostic, which is equivalent to the test obtained by
evaluating the derivatives in (3.9) at $\beta_1 = 0$, and ignoring terms for
the derivatives of the conditional variance function with respect to $\delta$,
$\alpha_1$, and $\beta_1$, leads to a conservative test. For instance, for
model 9 the actual fractile corresponding to the nominal .950 fractile is
estimated to be .995, and for model 14 the power of the test is almost equal
to the nominal size.

Interestingly, the results reported above indicate that the finite
sample distributions of the RB and HE LM statistics are very similar when
both the conditional mean and conditional variance of $y_t$ given $x_t$ are
correctly specified. This is hardly surprising as the asymptotic
distribution of the HE statistic in (3.9) may be shown to be robust to
symmetric departures from normality when the conditional fourth moment of the
errors is proportional to the square of the conditional variance. If the
error distribution is asymmetric and/or the conditional fourth moment of the
error distribution is not proportional to the square of the conditional
variance, then the HE LM test will generally have the wrong size. To

illustrate this, we repeated the simulations for model 10 with the
conditional error distribution of $e_t$ generated as a standardized central
$\chi^2_1$.

In  that  situation  the  actual  size  of  the  nominal  .050  RB  LM  test  was  found  to 

be  .057,  whereas  the  actual  sizes  of  the  .050  HE  LM  and  OPG  LM  tests  were 

estimated  at  .016  and  .312,  respectively.   Thus,  with  nonsymmetric  deviations 

from  normality,  the  HE  test  in  (3.9)  can  perform  quite  poorly,  even  when  the 

conditional  mean  and  variance  functions  are  correctly  specified.   At  the  same 

time,  the  actual  size  of  the  RB  test  remains  close  to  its  nominal  size,  as 

predicted  by  the  asymptotic  theory  developed  here. 
5. Conclusion

Multiple dynamic econometric models that jointly parameterize the
conditional means, conditional variances, and conditional covariances have
become increasingly popular in recent years. When conducting inference in
such models the assumption of conditional normality is typically maintained.
This auxiliary assumption is difficult to justify from a theoretical point of
view, and it is often violated by the data. Nonetheless, under fairly
general regularity conditions, the consistency and asymptotic normality of
the normal QMLE hold true, even when the assumption of conditional normal
errors and/or the conditional variance assumption are violated. Furthermore,
building on the results of Wooldridge (1987, 1988), simple formulas for the
corresponding robust standard errors, along with robust regression-based
procedures, are readily available. A Monte Carlo study for a set of
univariate AR time series models with GARCH errors indicates that these
asymptotically justified results carry over to finite samples. For the
sample sizes analyzed here, the biases in the QMLE are relatively minor. At
the same time, the choice of covariance matrix estimator plays an important
role when actually conducting inference. Wald tests based on
estimates of the quasi-information matrix or the outer product of the score
often lead to inference with the wrong size. In contrast, the actual size of
the Wald test, derived from a relatively simple estimate of the White (1982a)
covariance matrix estimator, is never very far from the nominal size; this is
true whether or not the auxiliary assumptions hold. Moreover, in most
situations the actual size and power properties of the robust LM procedure
compare favorably to the more traditional LM tests. This is particularly
true for the LM tests constructed from the regression of unity on the score,
for which the nominal size of the test can be quite misleading.

In summary, the additional matrix inversion and multiplication required
for the robust Wald tests, and the additional linear regression needed to
compute the robust LM statistic, seem to be small prices to pay in order to
guard against second moment misspecification and/or nonnormality when
conducting inference in dynamic econometric models.
Table 1
Parameter Sets for the Simulation Experiments

Model    φ1     φ2     δ      α1     α2     β1     ν      T
  1      .5     .0     1.0    .0     .0     .0     ∞      100
  2      .5     .0     1.0    .0     .0     .0     5.0    100
  3      .5     .0     1.0    .0     .0     .0     5.0    200
  4      .5     .0     .05    .15    .0     .8     5.0    100
  5      .5     .15    1.0    .0     .0     .0     ∞      100
  6      .5     .15    .05    .15    .0     .8     5.0    100
  7      .5     .0     .6     .4     .0     .0     ∞      100
  8      .5     .0     .6     .4     .0     .0     5.0    100
  9      .5     .0     .05    .15    .0     .8     ∞      200
 10      .5     .0     .05    .15    .0     .8     5.0    200
 11      .5     .0     .05    .15    .0     .8     5.0    400
 12      .5     .15    .05    .15    .0     .8     ∞      200
 13      .5     .15    .05    .15    .0     .8     5.0    200
 14      .5     .0     .1     .1     .2     .6     ∞      200
 15      .5     .0     .1     .1     .2     .6     5.0    200

Table 2
Sample Mean and Standard Deviation for QMLE from AR(1)

Model    $\hat\phi_1$       $\hat\delta$
  1      .492 (.086)        .991 (.141)
  2      .490 (.086)        .992 (.266)
  3      .495 (.060)        .994 (.188)
  4      .486 (.105)        .936 (1.024)
  5      .572 (.095)       1.011 (.147)
  6      .568 (.114)        .992 (1.391)
  7      .484 (.114)        .980 (.270)
  8      .479 (.128)        .971 (.550)

Table 3
Finite Sample Distribution of Wald Tests for $\phi_1 = .5$ from AR(1)

Model   Test    .900    .950    .975    .990
  1     RB      .889    .940    .969    .986
        HE      .897    .950    .976    .991
        OPG     .904    .954    .976    .991
  2     RB      .885    .939    .968    .985
        HE      .904    .951    .974    .988
        OPG     .906    .951    .973    .984
  3     RB      .893    .945    .973    .988
        HE      .905    .954    .978    .991
        OPG     .908    .953    .973    .988
  4     RB      .880    .931    .962    .980
        HE      .834    .900    .939    .965
        OPG     .773    .844    .887    .925
  7     RB      .871    .929    .960    .979
        HE      .800    .865    .909    .948
        OPG     .708    .782    .835    .883
  8     RB      .859    .915    .949    .974
        HE      .762    .835    .880    .910
        OPG     .663    .738    .788    .834

Table 4
Finite Sample Distribution of LM Tests for AR(1) versus AR(2)

Model  Test   .900   .950   .975   .990

1      RB     .897   .951   .978   .992
       HE     .901   .949   .977   .992
       OPG    .892   .948   .976   .991

2      RB     .899   .950   .978   .992
       HE     .904   .954   .977   .992
       OPG    .890   .944   .975   .991

4      RB     .898   .957   .983   .995
       HE     .834   .900   .940   .967
       OPG    .885   .949   .980   .993

5      RB     .613   .734   .826   .905
       HE     .613   .731   .815   .895
       OPG    .605   .726   .820   .902

6      RB     .680   .794   .874   .939
       HE     .599   .704   .787   .865
       OPG    .663   .779   .860   .929

7      RB     .895   .951   .975   .991
       HE     .840   .908   .944   .972
       OPG    .663   .779   .860   .929

8      RB     .895   .952   .979   .993
       HE     .824   .897   .935   .964
       OPG    .865   .935   .968   .987

Table 5
Finite Sample Distribution of LM Tests for AR(1) versus AR(1)-ARCH(1)

Model  Test   .900   .950   .975   .990

1      RB     .860   .922   .960   .986
       HE     .924   .965   .985   .994
       OPG    .828   .893   .932   .965

2      RB     .855   .931   .972   .993
       HE     .950   .975   .983   .990
       OPG    .757   .846   .899   .945

7      RB     .525   .742   .871   .954
       HE     .308   .387   .460   .548
       OPG    .347   .493   .632   .777

8      RB     .746   .891   .960   .991
       HE     .501   .570   .629   .702
       OPG    .566   .703   .809   .891

Table 6
Sample Mean and Standard Deviation for QMLE from AR(1)-GARCH(1,1)

Model  $\phi_1$       $\delta$       $\alpha_1$     $\beta_1$

9      .493 (.066)    .085 (.070)    .154 (.061)    .750 (.112)
10     .489 (.071)    .078 (.072)    .158 (.075)    .742 (.127)
11     .497 (.052)    .069 (.046)    .156 (.059)    .762 (.091)
12     .579 (.068)    .096 (.089)    .160 (.069)    .733 (.138)
13     .572 (.076)    .083 (.066)    .162 (.081)    .731 (.140)
14     .491 (.063)    .103 (.050)    .183 (.067)    .686 (.090)
15     .494 (.070)    .106 (.057)    .173 (.079)    .666 (.112)

Table 7
Finite Sample Distribution of Wald Tests for $\phi_1 = .5$ from AR(1)-GARCH(1,1)

Model  Test   .900   .950   .975   .990

9      RB     .883   .952   .976   .991
       HE     .882   .956   .980   .991
       OPG    .893   .959   .980   .990

10     RB     .891   .954   .976   .991
       HE     .879   .946   .972   .991
       OPG    .886   .942   .970   .987

11     RB     .895   .947   .966   .987
       HE     .882   .938   .966   .988
       OPG    .875   .935   .959   .983


Table 8
Finite Sample Distribution of Wald Tests for $\alpha_1 = .15$ from AR(1)-GARCH(1,1)

Model  Test   .900   .950   .975   .990

9      RB     .918   .954   .972   .983
       HE     .936   .970   .980   .990
       OPG    .955   .974   .986   .991

10     RB     .923   .952   .970   .984
       HE     .884   .932   .956   .970
       OPG    .829   .884   .916   .944

11     RB     .909   .941   .957   .969
       HE     .824   .886   .919   .945
       OPG    .699   .789   .848   .887


Table 9
Finite Sample Distribution of LM Tests for AR(1)-GARCH(1,1)
Versus AR(2)-GARCH(1,1)

Model  Test   .900   .950   .975   .990

9      RB     .902   .953   .976   .991
       HE     .895   .951   .976   .989
       OPG    .878   .941   .966   .982

10     RB     .909   .958   .981   .992
       HE     .886   .952   .978   .991
       OPG    .861   .925   .956   .980

12     RB     .390   .528   .634   .775
       HE     .400   .529   .643   .759
       OPG    .350   .481   .578   .688

13     RB     .442   .578   .691   .803
       HE     .400   .516   .643   .781
       OPG    .389   .506   .603   .707


Table 10
Finite Sample Distribution of LM Tests for AR(1)-GARCH(1,1)
Versus AR(1)-GARCH(1,2)

Model  Test   .900   .950   .975   .990

9      RB     .882   .948   .975   .992
       HE     .895   .952   .976   .993
       OPG    .845   .906   .946   .972

10     RB     .902   .954   .985   .995
       HE     .899   .957   .986   .994
       OPG    .770   .839   .893   .937

14     RB     .586   .722   .826   .911
       HE     .588   .692   .799   .873
       OPG    .518   .631   .723   .823

15     RB     .713   .822   .908   .970
       HE     .734   .823   .903   .961
       OPG    .587   .680   .744   .819

Appendix  A 


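Conventions: If $A(\theta)$ is an $N \times M$ matrix depending on the $P \times 1$ vector $\theta$, and $a_{ij}(\theta)$ is the $(i,j)$th element of $A(\theta)$, then the derivative of $A$ with respect to $\theta_k$, $\nabla_{\theta_k} A(\theta)$, is the $N \times M$ matrix with $(i,j)$th element $\partial a_{ij}(\theta)/\partial \theta_k$. Also, the "derivative" of $A(\theta)$ with respect to $\theta$ is the $NM \times P$ matrix

$\nabla_\theta A(\theta) \equiv [\operatorname{vec} \nabla_{\theta_1} A(\theta) \; \cdots \; \operatorname{vec} \nabla_{\theta_P} A(\theta)]$.

For any differentiable $L \times 1$ function $b(\theta)$, define the second derivative of $b$
to be the $LP \times P$ matrix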


LEMMA A.1: Let $A$ be a $K \times K$ positive definite matrix. Then

(a.1)  $\log |A| \le \operatorname{tr}(A - I_K)$

with equality holding if and only if $A = I_K$.

PROOF: By Rao (1973, Exercise 20.2(c), p. 74),

$|A| \le a_{11} \cdots a_{KK}$

with equality holding if and only if $A$ is diagonal. Hence,

(a.2)  $\log |A| \le \sum_{j=1}^{K} \log a_{jj} \le \sum_{j=1}^{K} (a_{jj} - 1) = \sum_{j=1}^{K} a_{jj} - K = \operatorname{tr}(A - I_K)$.

The first inequality in (a.2) is strict unless $A$ is diagonal, and the second is strict unless $a_{jj} = 1$ for all $j = 1, \ldots, K$. Thus, the inequality in (a.1) is strict unless $A = I_K$.
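
The inequality in Lemma A.1 is easy to check numerically. The sketch below is an illustration only, not part of the proof: it draws random positive definite matrices (the construction `M @ M.T + 0.1 * I` and the helper name `logdet_gap` are assumptions made for the example) and evaluates the gap $\operatorname{tr}(A - I_K) - \log|A|$, which should be positive for every draw and zero at $A = I_K$.

```python
import numpy as np

def logdet_gap(A):
    """Return tr(A - I_K) - log|A| for a symmetric positive definite matrix A."""
    A = np.asarray(A, dtype=float)
    K = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("A must be positive definite")
    return np.trace(A - np.eye(K)) - logdet

rng = np.random.default_rng(0)
gaps = []
for _ in range(1000):
    M = rng.standard_normal((4, 4))
    A = M @ M.T + 0.1 * np.eye(4)      # a random positive definite matrix
    gaps.append(logdet_gap(A))

min_gap = min(gaps)                     # smallest gap over all draws
identity_gap = logdet_gap(np.eye(4))    # the equality case A = I_K
```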
LEMMA A.2: Let $y$ be a $K \times 1$ random vector with finite second moments, and let $\mu_o \equiv E(y)$, $\Sigma_o \equiv V(y)$. Define the functions

$q(y; \mu, \Sigma) \equiv \log |\Sigma| + (y - \mu)' \Sigma^{-1} (y - \mu)$
$\bar{q}(\mu, \Sigma) \equiv E[q(y; \mu, \Sigma)]$

for $\mu \in R^K$, $\Sigma$ a positive definite $K \times K$ matrix. Then $\bar{q}$ is uniquely minimized by $(\mu_o, \Sigma_o)$.

PROOF: Straightforward algebra shows that

$\bar{q}(\mu, \Sigma) = \log |\Sigma| + \operatorname{tr} \Sigma^{-1} \Sigma_o + (\mu_o - \mu)' \Sigma^{-1} (\mu_o - \mu)$.

Therefore, $\bar{q}(\mu, \Sigma) > \bar{q}(\mu_o, \Sigma)$ for any $\mu \ne \mu_o$ and any p.d. matrix $\Sigma$. It remains to show that $\bar{q}(\mu_o, \Sigma) > \bar{q}(\mu_o, \Sigma_o)$ for any p.d. matrix $\Sigma \ne \Sigma_o$, i.e.

$\log |\Sigma| + \operatorname{tr} \Sigma^{-1} \Sigma_o > \log |\Sigma_o| + \operatorname{tr} I_K$

or

$\log |\Sigma^{-1} \Sigma_o| < \operatorname{tr}(\Sigma^{-1} \Sigma_o - I_K)$.

But this follows from Lemma A.1 by setting $A \equiv \Sigma^{-1/2} \Sigma_o \Sigma^{-1/2}$ and using the commutativity of the determinant and trace operators.
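
As a numerical illustration of Lemma A.2, the closed form for the expected objective derived in the proof can be evaluated at perturbed values of $(\mu, \Sigma)$ and compared with its value at $(\mu_o, \Sigma_o)$. The sketch below is an assumption-laden example rather than anything from the paper: the "true" moments are drawn at random, the perturbation scheme is arbitrary, and the function name `q_bar` is hypothetical.

```python
import numpy as np

def q_bar(mu, Sigma, mu_o, Sigma_o):
    """log|Sigma| + tr(Sigma^{-1} Sigma_o) + (mu_o - mu)' Sigma^{-1} (mu_o - mu)."""
    Sigma_inv = np.linalg.inv(Sigma)
    d = mu_o - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return logdet + np.trace(Sigma_inv @ Sigma_o) + d @ Sigma_inv @ d

rng = np.random.default_rng(1)
K = 3
mu_o = rng.standard_normal(K)
M = rng.standard_normal((K, K))
Sigma_o = M @ M.T + np.eye(K)           # "true" covariance, positive definite
q_min = q_bar(mu_o, Sigma_o, mu_o, Sigma_o)

excess = []
for _ in range(500):
    mu = mu_o + 0.5 * rng.standard_normal(K)
    P = rng.standard_normal((K, K))
    Sigma = Sigma_o + 0.2 * (P @ P.T)   # another p.d. matrix, different from Sigma_o
    excess.append(q_bar(mu, Sigma, mu_o, Sigma_o) - q_min)

min_excess = min(excess)                # should be strictly positive
```

At the minimizer the objective reduces to $\log |\Sigma_o| + K$, which gives a second quick check on the closed form.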


Conditions A.1:

(i) $\Theta$ is compact and has nonempty interior; $\theta_o \in \operatorname{int} \Theta$.

(ii) $\mu_t(\cdot, \theta)$ and $\Omega_t(\cdot, \theta)$ are measurable for all $\theta \in \Theta$, and $\mu_t(x_t, \cdot)$ and $\Omega_t(x_t, \cdot)$ are twice continuously differentiable on $\operatorname{int} \Theta$ for all $x_t$. $\Omega_t(x_t, \theta)$ is nonsingular with $P$-probability one, for all $\theta \in \Theta$.

(iii) (a) $\{l_t(\theta): t = 1, 2, \ldots\}$ satisfies the UWLLN (see Wooldridge (1988, Definition A.1)).

(b) $\theta_o$ is the identifiably unique minimizer (see Bates and White (1985)) of

$T^{-1} \sum_{t=1}^{T} E[\log |\Omega_t(\theta)| + (y_t - \mu_t(\theta)) \Omega_t^{-1}(\theta) (y_t - \mu_t(\theta))']$.

(iv) (a) $\{h_t(\theta)\}$ and $\{a_t(\theta)\}$ satisfy the UWLLN.

(b) $\{A_T^o \equiv T^{-1} \sum_{t=1}^{T} E[a_t(\theta_o)]\}$ is uniformly positive definite.

(v) (a) $\{B_T^o \equiv T^{-1} \sum_{t=1}^{T} E[s_t(\theta_o)' s_t(\theta_o)]\}$ is uniformly p.d.

(b) $B_T^{o-1/2} T^{-1/2} \sum_{t=1}^{T} s_t(\theta_o)' \overset{d}{\to} N(0, I_P)$.

(vi) $\{s_t(\theta)' s_t(\theta)\}$ satisfies the UWLLN.

PROOF OF THEOREM 2.1: First, application of Lemma A.2 demonstrates that $\theta_o$ is a maximizer of $E[l_t(\theta) | x_t]$ for all $x_t$, $t = 1, 2, \ldots$. Consequently, $\theta_o$ is a maximizer of

$T^{-1} \sum_{t=1}^{T} E[l_t(\theta)]$.

Following standard practice, we strengthen this conclusion by assuming that $\theta_o$ is identifiably unique. This, combined with the assumption that $\{l_t(\theta)\}$ satisfies the UWLLN, establishes the weak consistency of the QMLE under (i), (ii), and (iii) ((i) and (ii) are actually much stronger than needed for consistency). Next, the score is seen to be

$s_t(\theta)' = \nabla_\theta \mu_t(\theta)' \Omega_t^{-1}(\theta) \varepsilon_t(\theta)' + \tfrac{1}{2} \nabla_\theta \Omega_t(\theta)' [\Omega_t^{-1}(\theta) \otimes \Omega_t^{-1}(\theta)] \operatorname{vec}[\varepsilon_t(\theta)' \varepsilon_t(\theta) - \Omega_t(\theta)]$.

Differentiation shows that the Hessian of $l_t$ can be expressed as

$h_t(\theta) = -a_t(\theta) + c_t(\theta)$,

where $a_t(\theta)$ is given by (2.7) and $E[c_t(\theta_o) | x_t] = 0$. Because $c_t(\theta_o)$ has mean zero it can be omitted when estimating $E[h_t(\theta_o)]$. A standard mean value expansion yields

$\sqrt{T}(\hat{\theta}_T - \theta_o) = [-H_T]^{-1} T^{-1/2} S_T(\theta_o)$  w.p.a.1,

where $H_T$ is the Hessian of $L_T/T$ evaluated at mean values. Assumption (iv) and the fact that $a_t(\theta_o) + E[h_t(\theta_o) | x_t] = 0$ imply that

$-H_T - A_T^o \overset{p}{\to} 0$

by Wooldridge (1988, Lemma A.1). Combined with (v), this shows that

$\sqrt{T}(\hat{\theta}_T - \theta_o) = A_T^{o-1} T^{-1/2} S_T(\theta_o) + o_p(1)$.

By (v) and the asymptotic equivalence lemma,

$\sqrt{T}(\hat{\theta}_T - \theta_o) \overset{a}{\sim} N(0, A_T^{o-1} B_T^o A_T^{o-1})$.

Finally, the consistency of $\hat{A}_T$ for $A_T^o$ and of $\hat{B}_T$ for $B_T^o$ follow from (iv.a) and (vi), respectively, upon application of Wooldridge (1988, Lemma A.1).
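
The limiting covariance $A_T^{o-1} B_T^o A_T^{o-1}$ in the proof can be illustrated in the simplest special case. The sketch below is an illustrative assumption, not an example from the paper: it takes a scalar model with constant mean $\mu$ and constant variance $\sigma^2$, so that $\theta = (\mu, \sigma^2)$, and fits it by Gaussian quasi-likelihood to Student-t data. In that case the expected negative Hessian is $\operatorname{diag}(1/\sigma^2, 1/(2\sigma^4))$ and the score is $(\varepsilon_t/\sigma^2, (\varepsilon_t^2 - \sigma^2)/(2\sigma^4))$. The function name `robust_cov` is hypothetical.

```python
import numpy as np

def robust_cov(y):
    """Return (A^{-1} B A^{-1} / T, A^{-1} / T) for theta = (mu, sigma2)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    mu_hat = y.mean()                   # QMLE of the mean
    eps = y - mu_hat
    s2_hat = np.mean(eps ** 2)          # QMLE of the variance
    # Per-observation scores (T x 2) evaluated at the QMLE.
    scores = np.column_stack([eps / s2_hat,
                              (eps ** 2 - s2_hat) / (2.0 * s2_hat ** 2)])
    # A: expected negative Hessian; B: average outer product of the scores.
    A = np.diag([1.0 / s2_hat, 1.0 / (2.0 * s2_hat ** 2)])
    B = scores.T @ scores / T
    A_inv = np.linalg.inv(A)
    V_robust = A_inv @ B @ A_inv / T    # valid under nonnormality
    V_hessian = A_inv / T               # valid only if B equals A
    return V_robust, V_hessian

rng = np.random.default_rng(2)
y = rng.standard_t(5, size=5000)        # fat-tailed data, Gaussian quasi-likelihood
V_robust, V_hessian = robust_cov(y)
```

In this example the two variance estimates for $\hat{\mu}$ coincide, while the robust variance for $\hat{\sigma}^2$ exceeds the Hessian-based one whenever the sample kurtosis is above 3, which is where the nonrobust standard errors go wrong.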




Appendix  B 

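In the notation of section 2, the AR(2)-GARCH(1,2) model in (4.1) is
written as

(b.1)  $E(y_t | x_t) = \mu_t(\theta_o) = \phi_{o1} y_{t-1} + \phi_{o2} y_{t-2}$

(b.2)  $V(y_t | x_t) = \omega_t(\theta_o) = \delta_o + \alpha_{o1} \varepsilon_{t-1}^2(\theta_o) + \alpha_{o2} \varepsilon_{t-2}^2(\theta_o) + \beta_{o1} \omega_{t-1}(\theta_o)$,

where

$\varepsilon_t(\theta) \equiv y_t - \mu_t(\theta)$,

$\theta \equiv (\phi_1, \phi_2, \delta, \alpha_1, \alpha_2, \beta_1)'$, and $\theta_o \equiv (\phi_{o1}, \phi_{o2}, \delta_o, \alpha_{o1}, \alpha_{o2}, \beta_{o1})'$. Straightforward
differentiation of the conditional mean in (b.1) yields the $1 \times 6$ vector

(b.3)  $\nabla_\theta \mu_t(\theta) = (y_{t-1}, y_{t-2}, 0, 0, 0, 0)$.

The derivative of the conditional variance function is given by the recursive
formula

(b.4)  $\nabla_\theta \omega_t(\theta)' = \begin{pmatrix} -2\alpha_1 \varepsilon_{t-1}(\theta) y_{t-2} - 2\alpha_2 \varepsilon_{t-2}(\theta) y_{t-3} \\ -2\alpha_1 \varepsilon_{t-1}(\theta) y_{t-3} - 2\alpha_2 \varepsilon_{t-2}(\theta) y_{t-4} \\ 1 \\ \varepsilon_{t-1}^2(\theta) \\ \varepsilon_{t-2}^2(\theta) \\ \omega_{t-1}(\theta) \end{pmatrix} + \beta_1 \nabla_\theta \omega_{t-1}(\theta)'$.

The derivatives for any model nested within the AR(2)-GARCH(1,2) model can be
found by simply fixing the relevant parameters at zero and deleting the
corresponding redundant elements in (b.3) and (b.4).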
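
The recursion for the variance derivative can be coded directly and checked against numerical differentiation. The following is a minimal sketch under stated assumptions: the start-up treatment (the conditional variance is held at a fixed value `omega_init`, with zero gradient, for the first four observations) is a choice made for this example and is not taken from the paper, and the function name `garch_variance_and_grad` is hypothetical.

```python
import numpy as np

def garch_variance_and_grad(theta, y, omega_init=1.0):
    """Conditional variance omega_t and its gradient with respect to
    theta = (phi1, phi2, delta, alpha1, alpha2, beta1), by the recursion above."""
    phi1, phi2, delta, a1, a2, b1 = theta
    T = y.size
    eps = np.zeros(T)
    eps[2:] = y[2:] - phi1 * y[1:-1] - phi2 * y[:-2]   # AR(2) residuals
    omega = np.full(T, omega_init)    # assumed start-up values for t < 4
    grad = np.zeros((T, 6))           # assumed zero start-up gradient
    for t in range(4, T):
        omega[t] = delta + a1 * eps[t - 1] ** 2 + a2 * eps[t - 2] ** 2 + b1 * omega[t - 1]
        direct = np.array([
            -2 * a1 * eps[t - 1] * y[t - 2] - 2 * a2 * eps[t - 2] * y[t - 3],
            -2 * a1 * eps[t - 1] * y[t - 3] - 2 * a2 * eps[t - 2] * y[t - 4],
            1.0,
            eps[t - 1] ** 2,
            eps[t - 2] ** 2,
            omega[t - 1],
        ])
        grad[t] = direct + b1 * grad[t - 1]   # recursive term beta1 * grad_{t-1}
    return omega, grad

rng = np.random.default_rng(3)
y = rng.standard_normal(60)
theta = np.array([0.5, 0.15, 0.05, 0.15, 0.1, 0.7])
omega, grad = garch_variance_and_grad(theta, y)

# Central finite differences for the last observation as a consistency check.
h = 1e-6
num_grad = np.zeros(6)
for k in range(6):
    step = np.zeros(6)
    step[k] = h
    up, _ = garch_variance_and_grad(theta + step, y)
    dn, _ = garch_variance_and_grad(theta - step, y)
    num_grad[k] = (up[-1] - dn[-1]) / (2 * h)
max_abs_err = np.max(np.abs(num_grad - grad[-1]))
```

Only first derivatives are needed here, which is what makes the robust standard errors and LM statistics inexpensive to compute for this class of models.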